Artificial neural networks are functions depending on a finite number of parameters typically encoded as weights and biases. The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the \emph{teacher-student model}, and this model has represented a popular framework for understanding training and generalization. Even if the problem is NP-complete in the worst case, a rapidly growing literature -- after adding suitable distributional assumptions -- has established finite sample identification of two-layer networks with a number of neurons $m=\mathcal O(D)$, $D$ being the input dimension. For the range $D<m<D^2$ the problem becomes harder, and truly little is known for networks parametrized by biases as well. This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the direction of the weights, by exploiting second order information; next, we identify the signs by suitable algebraic evaluations, and we recover the biases by empirical risk minimization via gradient descent. Numerical results demonstrate the effectiveness of our approach.
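To see why second-order information can expose the weight directions, consider a simplified model. The form below is an illustrative assumption (unit outer weights, a twice-differentiable activation $g$, weights $w_i$, biases $b_i$), not necessarily the exact parametrization used in the paper:

```latex
% Assumed shallow network with m neurons and input x in R^D
f(x) = \sum_{i=1}^{m} g\bigl(w_i^\top x + b_i\bigr)

% Differentiating twice with respect to x gives
\nabla^2 f(x) = \sum_{i=1}^{m} g''\bigl(w_i^\top x + b_i\bigr)\, w_i w_i^\top
\;\in\; \operatorname{span}\bigl\{\, w_1 w_1^\top, \dots, w_m w_m^\top \bigr\}
```

Under this assumption, every Hessian is a combination of the rank-one matrices $w_i w_i^\top$. Since $w_i w_i^\top = (-w_i)(-w_i)^\top$, such information determines each direction only up to sign, which is consistent with the signs being identified in a separate step.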
translated by Google Translate
Artificial neural networks can learn complex, salient data features to achieve a given task. On the opposite end of the spectrum, mathematically grounded methods such as topological data analysis allow users to design analysis pipelines fully aware of data constraints and symmetries. We introduce a class of persistence-based neural network layers. Persistence-based layers allow the users to easily inject knowledge about symmetries (equivariance) respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the energy consumption of the pair. We approach the problem by means of Multi-Objective Reinforcement Learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, $\tau$. We show that there is a range of decision times, between the Lyapunov time and the continuous updating limit, where Reinforcement Learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller $\tau$ all a priori heuristic strategies become Pareto optimal.
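As a rough illustration of how scalarization can be combined with Q-learning, the sketch below collapses a two-component reward into a single number with a weight vector and applies one tabular update. The linear scalarization, the toy state and action counts, the reward values, and the names `scalarize` and `q_update` are all assumptions made for illustration; the abstract does not specify these details.

```python
def scalarize(rewards, weights):
    """Linear scalarization: collapse a reward vector into one number."""
    return sum(w * r for w, r in zip(weights, rewards))


def q_update(q, state, action, rewards, next_state, weights,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step to the scalarized reward."""
    r = scalarize(rewards, weights)
    best_next = max(q[next_state])  # greedy value of the next state
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])
    return q[state][action]


# Hypothetical toy problem: 2 states, 2 actions, Q-table initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
weights = (0.7, 0.3)     # assumed trade-off between the two objectives
rewards = (-1.0, -2.0)   # assumed (dispersion penalty, energy penalty)
value = q_update(q, state=0, action=1, rewards=rewards,
                 next_state=1, weights=weights)
```

Repeating the training for several weight vectors is one way to obtain a set of trade-off policies, from which the non-dominated ones can be kept.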
Token-free approaches have been successfully applied to a series of word- and span-level tasks. In this work, we compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine-translated examples, we are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights on the cross-lingual transfer of ByT5 and show how the model compares with respect to mT5 across all parameter sizes.
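Stochastic filters for online state estimation are a core technology for autonomous systems. The performance of such filters is one of the key limiting factors of a system's capability. Both the asymptotic behavior (e.g., for regular operation) and the transient response (e.g., for fast initialization and reset) of such filters are crucial for guaranteeing robust operation of autonomous systems. This paper introduces a new generic formulation for a gyroscope-aided attitude estimator using $n$ direction measurements, including both body-frame and reference-frame direction-type measurements. The approach is based on an integrated state formulation that incorporates navigation, extrinsic calibration for all direction sensors, and gyroscope bias states in a single equivariant geometric structure. This newly proposed symmetry allows modular addition of different direction measurements and their extrinsic calibration, while maintaining the ability to include bias states in the same symmetry. The subsequently proposed filter-based estimator using this symmetry noticeably improves the transient response, as well as the asymptotic bias and extrinsic calibration estimation, compared to state-of-the-art approaches. The estimator is verified in statistically representative simulations and tested in real-world experiments.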
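Predicting the duration of traffic incidents is a hard problem because of the stochastic nature of spatio-temporal event occurrence, the lack of information at the beginning of a reported traffic disruption, and the lack of advanced methods in transport engineering for deriving insights from past incidents. This paper proposes a new fusion framework for predicting incident duration from limited information by integrating machine learning with traffic flow/speed and incident descriptions as features, encoded via several deep learning methods (an ANN autoencoder and a character-level LSTM-ANN sentiment classifier). The paper builds a cross-disciplinary modelling approach spanning transport and data science. The approach improves incident duration prediction accuracy over the best-performing ML models applied to baseline incident reports. Results show that our proposed method can improve accuracy by $60\%$ compared to standard linear or support vector regression models, with a further $7\%$ improvement with respect to the hybrid deep learning auto-encoded GBDT model, which seems to outperform all other models. The application area is the city of San Francisco, which is rich in both traffic incident logs (Countrywide Traffic Accident Data set) and historical traffic congestion information (5-minute precision measurements from the Caltrans Performance Measurement System).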
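Human Activity Recognition (HAR) based on inertial data is an increasingly widespread task on embedded devices, from smartphones to ultra-low-power sensors. Because of the high computational complexity of deep learning models, most embedded HAR systems are based on simple and less accurate classic machine learning algorithms. This work bridges the gap between on-device HAR and deep learning, proposing a set of efficient one-dimensional Convolutional Neural Networks (CNNs) deployable on general-purpose microcontrollers (MCUs). Our CNNs are obtained by combining hyper-parameter optimization with sub-byte and mixed-precision quantization, to find good trade-offs between classification results and memory occupation. Moreover, we leverage adaptive inference as an orthogonal optimization to tune the inference complexity at runtime based on the processed input, hence producing a more flexible HAR system. With experiments on four datasets, and targeting an ultra-low-power RISC-V MCU, we show that (i) we are able to obtain a rich set of Pareto-optimal CNNs for HAR, spanning more than one order of magnitude in memory, latency and energy consumption; (ii) thanks to adaptive inference, we can derive >20 runtime operating modes starting from a single CNN, differing by up to 10% in classification scores and by more than 3x in inference complexity, with a limited memory overhead; (iii) on three of the four benchmarks, we outperform all previous deep learning methods, reducing the memory occupation by more than 100x, and the few methods that obtain better performance (both shallow and deep) are not compatible with MCU deployment; (iv) all our CNNs are compatible with real-time on-device HAR, with an inference latency <16 ms. Their memory occupation ranges from 0.05 to 23.17 kB and their energy consumption from 0.005 to 61.59 uJ, allowing years of continuous operation on a small battery supply.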
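The abstract does not say how adaptive inference is implemented. One common way to adjust inference cost to the input is a confidence-gated cascade, sketched below under that assumption: a cheap classifier answers when it is confident, and a costlier one is invoked otherwise. The two models, the threshold value, and the name `adaptive_predict` are hypothetical stand-ins, not the paper's networks.

```python
def adaptive_predict(x, small_model, big_model, threshold=0.8):
    """Return (predicted class, model used) with a confidence-gated cascade."""
    probs = small_model(x)
    if max(probs) >= threshold:
        # The cheap model is confident enough, so stop here.
        return probs.index(max(probs)), "small"
    # Otherwise pay for the more expensive model.
    probs = big_model(x)
    return probs.index(max(probs)), "big"


# Hypothetical stand-in classifiers returning class probabilities.
def small_model(x):
    return [0.9, 0.1] if x > 0 else [0.55, 0.45]


def big_model(x):
    return [0.2, 0.8]


easy = adaptive_predict(1.0, small_model, big_model)
hard = adaptive_predict(-1.0, small_model, big_model)
```

In a scheme like this, each threshold value corresponds to a different operating point between average cost and accuracy, selectable at runtime without retraining.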
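To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks that manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL, and examines the potential damage of two natural types of poisoning attack, namely the manipulation of \emph{reward} and of \emph{action}. We find that the effect of the attacks depends crucially on whether the rewards are bounded or unbounded. In the bounded reward setting, we show that reward manipulation alone or action manipulation alone cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with $\tilde{\Theta}(\sqrt{T})$ total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in the unbounded reward setting, we show that reward manipulation attacks are sufficient for an adversary to successfully manipulate any order-optimal learning algorithm to follow any targeted policy using $\tilde{O}(\sqrt{T})$ amount of contamination. Our results reveal useful insights about what can and cannot be achieved by poisoning attacks, and should spur more work on the design of robust RL algorithms.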
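Modern sky surveys are producing large amounts of observational data, which makes the application of classical methods for classifying and analysing objects challenging and time-consuming. However, this problem may be significantly mitigated by the use of automatic machine and deep learning methods. We propose ULISSE, a new deep learning tool that, starting from a single prototype object, is capable of identifying objects sharing the same morphological and photometric properties, and hence of creating a list of candidate lookalikes (sosia). In this work, we focus on applying our method to the detection of AGN candidates in a Sloan Digital Sky Survey galaxy sample, since the identification and classification of Active Galactic Nuclei (AGN) in the optical band remains a challenging task in extragalactic astronomy. Intended for the initial exploration of large sky surveys, ULISSE directly uses features extracted from the ImageNet dataset to perform a similarity search. The method is able to quickly identify a list of candidates starting from only a single image of a given prototype, without any time-consuming neural network training. Our experiments show that ULISSE is able to identify AGN candidates based on a combination of host galaxy morphology, color and the presence of a central nuclear source, with a retrieval efficiency ranging from 21% to 65% (including composite sources) depending on the prototype, against a random-guess baseline of 12%. We find ULISSE to be most effective at retrieving AGN in early-type host galaxies, as opposed to prototypes with spiral or late-type properties. Based on the results described in this work, ULISSE can be a promising tool for selecting different types of astrophysical objects in current and future wide-field surveys (e.g. Euclid, LSST, etc.) that target millions of sources every night.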
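The similarity-search step can be pictured as a nearest-neighbour query in feature space. The sketch below assumes each image has already been reduced to a feature vector and ranks catalogue objects by Euclidean distance to the prototype. The distance metric, the three-dimensional toy features, the object identifiers, and the name `nearest_neighbours` are illustrative assumptions, not details taken from the abstract.

```python
import math


def nearest_neighbours(prototype, catalogue, k):
    """Return the ids of the k catalogue objects closest to the prototype."""
    ranked = sorted(catalogue.items(),
                    key=lambda item: math.dist(prototype, item[1]))
    return [obj_id for obj_id, _ in ranked[:k]]


# Hypothetical precomputed feature vectors, one per catalogue object.
catalogue = {
    "obj_a": (0.9, 0.1, 0.0),
    "obj_b": (0.0, 1.0, 0.2),
    "obj_c": (1.0, 0.0, 0.1),
}
prototype = (1.0, 0.0, 0.0)  # features of the single prototype image
candidates = nearest_neighbours(prototype, catalogue, k=2)
```

Because the features are computed once and no model is trained, changing the prototype only requires re-running the ranking.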
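Generating dense point clouds from sparse raw data benefits downstream 3D understanding tasks, but existing models are limited to a fixed upsampling ratio or to a short range of integer values. In this paper, we present APU-SMOG, a Transformer-based model for Arbitrary Point cloud Upsampling (APU). The sparse input is first mapped to a Spherical Mixture of Gaussians (SMOG) distribution, from which an arbitrary number of points can be sampled. These samples are then fed as queries to the Transformer decoder, which maps them back to the target surface. Extensive qualitative and quantitative evaluations show that APU-SMOG outperforms state-of-the-art fixed-ratio methods, while effectively enabling upsampling with any scaling factor, including non-integer values, using a single trained model. The code will be made available.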
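The key property used here is that a mixture distribution can be sampled any number of times, so the output size is not tied to a fixed ratio. The sketch below draws `n` points from an isotropic Gaussian mixture in 3D and normalises them onto the unit sphere. This projection, the mixture parameters, and the name `sample_mixture_on_sphere` are assumptions for illustration only; the abstract does not describe how the SMOG distribution is parametrized.

```python
import math
import random


def sample_mixture_on_sphere(means, sigmas, mix_weights, n, seed=0):
    """Draw n points from a Gaussian mixture and project them to the unit sphere."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        # Pick a mixture component according to its weight.
        k = rng.choices(range(len(means)), weights=mix_weights)[0]
        # Sample an isotropic Gaussian around that component's mean.
        p = [rng.gauss(m, sigmas[k]) for m in means[k]]
        norm = math.sqrt(sum(c * c for c in p))
        if norm == 0.0:
            continue  # extremely unlikely; resample rather than divide by zero
        points.append(tuple(c / norm for c in p))
    return points


# Hypothetical two-component mixture.
means = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
sigmas = [0.1, 0.2]
mix_weights = [0.5, 0.5]
points = sample_mixture_on_sphere(means, sigmas, mix_weights, n=7)
```

Any value of `n` works, including one that is not an integer multiple of the number of input points, which is the sense in which the upsampling factor is arbitrary.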